Skip to content

Add workspace scanning to oak_scan and wire it to the LSP#1244

Open
lionel- wants to merge 30 commits into
mainfrom
oak/salsa-12-scan-workspace
Open

Add workspace scanning to oak_scan and wire it to the LSP#1244
lionel- wants to merge 30 commits into
mainfrom
oak/salsa-12-scan-workspace

Conversation

@lionel-

@lionel- lionel- commented May 29, 2026

Copy link
Copy Markdown
Contributor

Branched from #1243
Progress towards #1212

This PR adds workspace scanning to oak_scan and wires it into the LSP. The scanner walks a workspace folder, finds packages (any DESCRIPTION file in the tree with a Package field, honoring .gitignore) and top-level R scripts, and registers them under a Root in oak_db.

The LSP feeds three event types into the scanner:

  • initialize and didChangeWorkspaceFolders update workspace roots in the DB.
  • didChangeWatchedFiles applies surgical updates for single R files. When a change to a DESCRIPTION file is detected, this triggers a full rescan of the containing root (package files might be demoted to scripts).

Editor-owned URLs (files the editor has open) get special handling because we ignore disk state for those. The editor contents are the source of truth. When a workspace is removed, these files survive and get placed in OrphanRoot. This way analysis keeps working when a user closes a workspace folder while a buffer from it is still open.

Files that leave a live workspace (folder removed, file deleted, buffer closed) route to StaleRoot. We keep them there for as long as they remain deleted, and we bring them back if they are added again. This allows reusing the File inputs without creating new ones. Salsa inputs are never garbage collected so reusing them (based on their URL key) avoids unbounded memory growth when e.g. switching git branches, which can delete and bring back many files at a time.

Workspace scans run synchronously in the main loop of the LSP in this PR. Since these operations are too heavy to be run synchronously, and would stall requests from user interactions, the next PR moves
them off the main loop.

@lionel- lionel- force-pushed the oak/salsa-12-scan-workspace branch from 48eb900 to a29744e Compare May 29, 2026 13:51
Comment thread crates/ark/src/lsp/state_handlers.rs Outdated
// Try to load package from this workspace folder and set as
// root if found. This means we're dealing with a package
// source.
if state.root.is_none() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we eventually going to remove this root field? Probably right? It's not really correct for multi root workspaces anyways, and I feel like we will have this stored elsewhere

@lionel- lionel- Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we're heading towards a single notion of workspace root that can contain any number of scripts and packages. This way mono-repos are properly supported.

Comment thread crates/ark/src/lsp/state.rs Outdated
pub(crate) library: Library,

/// Salsa input tree for Oak queries.
pub(crate) oak: OakDatabase,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to strongly advocate for db naming scheme like everywhere else!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to strongly agree with you!

Comment thread crates/ark/src/lsp/state_handlers.rs Outdated
Comment on lines +142 to +152
// Start first round of indexing. `state.documents` is empty at init since
// no `didOpen` has fired yet, but build the set through the same shape we
// use elsewhere so the call site reads consistently.
let editor_owned: HashSet<UrlId> = state
.documents
.keys()
.map(|url| UrlId::from_url(url.clone()))
.collect();
state
.oak
.set_workspace_paths(&workspace_paths, &editor_owned);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just seeing something like

// Start first round of indexing. We are initializing, so no documents have been opened yet.
set_workspace_paths(&workspace_paths, &HashSet::new())

is less confusing

Comment thread crates/ark/src/lsp/state_handlers.rs Outdated
Comment thread crates/oak_scan/src/watch.rs Outdated
Comment on lines +128 to +129
/// every package. Mirrors the placement the bulk scanner would pick.
pub(crate) fn add_watched_file<DB: Db + DbInputs>(db: &mut DB, url: UrlId, contents: String) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mildly nervous about the fact that we have 2 implementations of file classification. One for bulk updates and one for incremental ish updates. It seems likely they could easily get out of sync :/

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extracted a placement classifier to prevent some of the potential drift

Comment thread crates/oak_scan/src/watch.rs Outdated
Comment on lines +157 to +175
let mut scripts = root.scripts(db).clone();
if !scripts.contains(&file) {
scripts.push(file);
root.set_scripts(db).to(scripts);
}
},
Placement::PackageFile(pkg) => {
let mut files = pkg.files(db).clone();
if !files.contains(&file) {
files.push(file);
pkg.set_files(db).to(files);
}
},
Placement::PackageScript(pkg) => {
let mut scripts = pkg.scripts(db).clone();
if !scripts.contains(&file) {
scripts.push(file);
pkg.set_scripts(db).to(scripts);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for all 3, only clone if !contains

Comment thread crates/oak_scan/src/watch.rs
Comment thread crates/oak_scan/src/watch.rs Outdated
@lionel-

lionel- commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

#1243 (comment)

@lionel- lionel- force-pushed the oak/salsa-11-scan-library branch from d50f230 to 76a82b0 Compare June 10, 2026 10:07
Base automatically changed from oak/salsa-11-scan-library to main June 10, 2026 10:11
@lionel- lionel- force-pushed the oak/salsa-12-scan-workspace branch from a29744e to eec3212 Compare June 10, 2026 10:12
@lionel- lionel- force-pushed the oak/salsa-12-scan-workspace branch from 1132f3d to 779653a Compare June 10, 2026 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants